Deep neural networks (DNNs) are sensitive and susceptible to tiny perturbation by adversarial attacks which causes erroneous predictions. Various methods, including adversarial defense and uncertainty inference (UI), have been developed in recent years to overcome the adversarial attacks. In this paper, we propose a multi-head uncertainty inference (MH-UI) framework for detecting adversarial attack examples. We adopt a multi-head architecture with multiple prediction heads (i.e., classifiers) to obtain predictions from different depths in the DNNs and introduce shallow information for the UI. Using independent heads at different depths, the normalized predictions are assumed to follow the same Dirichlet distribution, and we estimate distribution parameter of it by moment matching. Cognitive uncertainty brought by the adversarial attacks will be reflected and amplified on the distribution. Experimental results show that the proposed MH-UI framework can outperform all the referred UI methods in the adversarial attack detection task with different settings.
translated by 谷歌翻译
We present Pre-trained Machine Reader (PMR), a novel method to retrofit Pre-trained Language Models (PLMs) into Machine Reading Comprehension (MRC) models without acquiring labeled data. PMR is capable of resolving the discrepancy between model pre-training and downstream fine-tuning of existing PLMs, and provides a unified solver for tackling various extraction tasks. To achieve this, we construct a large volume of general-purpose and high-quality MRC-style training data with the help of Wikipedia hyperlinks and design a Wiki Anchor Extraction task to guide the MRC-style pre-training process. Although conceptually simple, PMR is particularly effective in solving extraction tasks including Extractive Question Answering and Named Entity Recognition, where it shows tremendous improvements over previous approaches especially under low-resource settings. Moreover, viewing sequence classification task as a special case of extraction task in our MRC formulation, PMR is even capable to extract high-quality rationales to explain the classification process, providing more explainability of the predictions.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Most semantic communication systems leverage deep learning models to provide end-to-end transmission performance surpassing the established source and channel coding approaches. While, so far, research has mainly focused on architecture and model improvements, but such a model trained over a full dataset and ergodic channel responses is unlikely to be optimal for every test instance. Due to limitations on the model capacity and imperfect optimization and generalization, such learned models will be suboptimal especially when the testing data distribution or channel response is different from that in the training phase, as is likely to be the case in practice. To tackle this, in this paper, we propose a novel semantic communication paradigm by leveraging the deep learning model's overfitting property. Our model can for instance be updated after deployment, which can further lead to substantial gains in terms of the transmission rate-distortion (RD) performance. This new system is named adaptive semantic communication (ASC). In our ASC system, the ingredients of wireless transmitted stream include both the semantic representations of source data and the adapted decoder model parameters. Specifically, we take the overfitting concept to the extreme, proposing a series of ingenious methods to adapt the semantic codec or representations to an individual data or channel state instance. The whole ASC system design is formulated as an optimization problem whose goal is to minimize the loss function that is a tripartite tradeoff among the data rate, model rate, and distortion terms. The experiments (including user study) verify the effectiveness and efficiency of our ASC system. Notably, the substantial gain of our overfitted coding paradigm can catalyze semantic communication upgrading to a new era.
translated by 谷歌翻译
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
时间序列数据出现在各种应用程序中,例如智能运输和环境监测。时间序列分析的基本问题之一是时间序列预测。尽管最近的深度时间序列预测方法取得了成功,但它们仍需要足够的历史价值观察才能进行准确的预测。换句话说,输出长度(或预测范围)与输入和输出长度之和的比率应足够低(例如,0.3)。随着比率的增加(例如,到0.8),预测准确性的不确定性显着增加。在本文中,我们从理论和经验上都表明,通过将相关时间序列检索作为参考文献可以有效地降低不确定性。在理论分析中,我们首先量化不确定性,并显示其与平方误差(MSE)的连接。然后,我们证明,带有参考的模型比没有参考的模型更容易学习,因为检索到的参考可能会降低不确定性。为了凭经验证明基于检索的时间序列预测模型的有效性,我们引入了一种简单而有效的两阶段方法,称为“保留”,该方法由关系检索和内容合成组成。我们还表明,可以轻松地适应时空时间序列和时间序列插补设置。最后,我们评估了现实世界数据集上的延迟,以证明其有效性。
translated by 谷歌翻译
非平行的多域语音转换方法(例如Stargan-VC)在许多情况下已被广泛应用。但是,这些模型的培训通常由于其复杂的对抗网络体系结构而构成挑战。为了解决这个问题,在这项工作中,我们利用最先进的对比学习技术,并将有效的暹罗网络结构纳入Stargan歧视者。我们的方法称为Simsiam-Stargan-VC,它提高了训练稳定性,并有效地防止了训练过程中的歧视者过度拟合问题。我们对语音转换挑战(VCC 2018)数据集进行了实验,并进行了用户研究,以验证我们的框架性能。我们的实验结果表明,Simsiam-Stargan-VC在客观和主观指标方面显着优于现有的Stargan-VC方法。
translated by 谷歌翻译
聊天机器人用于许多应用程序中,例如自动化代理,智能家庭助理,在线游戏中的互动角色等。因此,确保他们不会以不希望的方式行事,对用户提供令人反感或有毒的反应。这并不是一项琐碎的任务,因为最先进的聊天机器人模型是在从互联网公开收集的大型公共数据集上培训的。本文提出了对聊天机器人中毒性的首次大规模测量。我们表明,公开可用的聊天机器人很容易在喂养有毒的查询时提供有毒的反应。更令人担忧的是,一些无毒的查询也会触发有毒反应。然后,我们着手设计和实验攻击,即毒性,该攻击依赖于微调的GPT-2来产生无毒的查询,使聊天机器人以有毒的方式做出反应。我们广泛的实验评估表明,我们的攻击对公共聊天机器人模型有效,并且优于先前工作提出的手动制作的恶意查询。我们还评估了针对毒性的三种防御机制,表明它们要么以影响聊天机器人的效用而降低攻击性能,要么仅有效地减轻了一部分攻击。这强调了对计算机安全和在线安全社区进行更多研究的需求,以确保聊天机器人模型不会伤害其用户。总体而言,我们有信心有毒可以用作审计工具,我们的工作将为设计更有效的聊天机器人安全防御措施铺平道路。
translated by 谷歌翻译
很少有学习模型学习人类注释有限,而这种学习范式在各种任务中证明了实用性数据使该模型无法充分探索语义信息。为了解决这个问题,我们将知识蒸馏引入了几个弹出的对象检测学习范式。我们进一步进行了激励实验,该实验表明,在知识蒸馏的过程中,教师模型的经验误差将少数拍物对象检测模型的预测性能(作为学生)退化。为了了解这种现象背后的原因,我们从因果理论的角度重新审视了几个对象检测任务上知识蒸馏的学习范式,并因此发展了一个结构性因果模型。遵循理论指导,我们建议使用基于后门调整的知识蒸馏方法,用于少数拍物检测任务,即Disentangle和Remerge(D&R),以对相应的结构性因果模型进行有条件的因果干预。从理论上讲,我们为后门标准提供了扩展的定义,即一般后门路径,可以在特定情况下扩展后门标准的理论应用边界。从经验上讲,多个基准数据集上的实验表明,D&R可以在几个射击对象检测中产生显着的性能提升。
translated by 谷歌翻译
深度神经网络的鲁棒性对于现代AI支持系统至关重要,应正式验证。在广泛的应用中采用了类似乙状结肠的神经网络。由于它们的非线性,通常会过度评估乙状结肠样激活功能,以进行有效的验证,这不可避免地引入了不精确度。已大量的努力致力于找到所谓的更紧密的近似值,以获得更精确的验证结果。但是,现有的紧密定义是启发式的,缺乏理论基础。我们对现有神经元的紧密表征进行了彻底的经验分析,并揭示它们仅在特定的神经网络上是优越的。然后,我们将网络紧密度的概念介绍为统一的紧密度定义,并表明计算网络紧密度是一个复杂的非convex优化问题。我们通过两个有效的,最紧密的近似值从不同的角度绕过复杂性。结果表明,我们在艺术状态下的方法实现了有希望的表现:(i)达到高达251.28%的改善,以提高认证的较低鲁棒性界限; (ii)在卷积网络上表现出更为精确的验证结果。
translated by 谷歌翻译